# Dynamic Voltage and Frequency Management for a Low-Power Embedded Microprocessor

Masakatsu Nakai, Satoshi Akui, Katsunori Seno, Tetsumasa Meguro, Takahiro Seki, Tetsuo Kondo, Akihiko Hashiguchi, Hirokazu Kawahara, Kazuo Kumano, and Masayuki Shimura

Abstract-High-performance and low-power microprocessors are key to PDA applications. In this paper, a dynamic voltage and frequency management (DVFM) scheme with leakage power compensation effect is introduced in a microprocessor with 128-bit wideband 64-Mb embedded DRAM. The DVFM scheme autonomously controls clock frequency from 8 to 123 MHz in steps of 0.5 MHz and also adaptively controls supply voltage from 0.9 to 1.6 V in steps of 5 mV, achieving 82% power reduction in Personal Information Management scheduler application and 40% power reduction in MPEG4 movie playback. This low-power embedded microprocessor, fabricated with 0.18-$\mu$m CMOS embedded DRAM technology, enables high-performance operations such as audio and video applications. As process technology shrinks, this adaptive leakage power compensation scheme will become more important in realizing high-performance and low-power mobile consumer applications.

Index Terms—Delay synthesizer, dynamic frequency control (DFC), dynamic voltage and frequency management (DVFM), dynamic voltage control (DVC), embedded DRAM, leakage compensation, wideband bus architecture.

#### I. INTRODUCTION

DYNAMIC voltage and/or frequency control schemes have been reported in [1]–[5]. Our approach offers both dynamic frequency control (DFC) and dynamic voltage control (DVC). Clock frequency is autonomously and dynamically controlled while supply voltage is adaptively controlled, resulting in the leakage power compensation effect. This dynamic voltage and frequency management (DVFM) approach achieved 82% power reduction in a Personal Information Management (PIM) application.

Handheld audio and video applications require high-performance and low-power processor hardware. In the case of a multi-application product such as a PDA, performance and power requirements vary widely, depending on the application being run. For example, the targeted power consumption for a movie application is typically 250 mW, 75 mW for an audio application, 50 mW for a schedule application, and 3 mW for standby mode.

As process technology shrinks, variation among chips' characteristics becomes larger. The fluctuations in operation speed and power consumption also become serious. Also, the increase of power consumption by the subthreshold leakage is a critical problem for battery-driven devices. Research activity is intensive in this area [6], [7].

Manuscript received April 15, 2004; revised July 30, 2004.

M. Nakai, S. Akui, K. Seno, T. Meguro, T. Seki, T. Kondo, A. Hashiguchi, K. Kumano, and M. Shimura are with Semiconductor Solutions Network Company, Sony Corporation, Shinagawa-ku, Tokyo 140-0002, Japan (e-mail: nakai@sldc.sony.co.jp).

H. Kawahara is with IT & Mobile Solutions Network Company, Sony Corporation, Shinagawa-ku, Tokyo 140-0001, Japan.

Digital Object Identifier 10.1109/JSSC.2004.838021

The general methods of power reduction are voltage scaling and lowering the operating clock frequency. In our DVFM approach, clock frequency is autonomously and dynamically controlled while voltage is adaptively controlled at the same time. A delay synthesizer in the DVC circuit emulates and provides the circuit delay information, while the DFC circuit determines the optimum operating frequency for the microprocessor to perform the desired functions efficiently. To lower the required operating frequency, this microprocessor incorporates a 2-D graphics engine, a DSP core, and a 128-bit wideband bus architecture with 64 Mb of embedded DRAM.

Section II explains the design concepts used to simultaneously achieve high-performance and low-power consumption. Section III describes in detail the techniques used in our DVFM scheme to achieve low power consumption and leakage-compensation effect. Section IV reports the magnitude of power reduction achieved through use of the DVFM. Finally, Section V summarizes this work.

#### II. DESIGN CONCEPT

Fig. 1 shows a block diagram of this microprocessor. It has four 128-bit data-width 16-Mb embedded DRAM macros. The processor blocks are connected to the embedded DRAM by a 128-bit bus to meet high memory bandwidth requirements. Other devices are connected via a bus bridge to a 32-bit bus. A CPU and many peripheral blocks are connected by the 32-bit CPU bus. The DSP and audio IF blocks for audio applications are connected via shared memory to the CPU bus. The embedded DRAM, the wideband bus architecture, and hardware engines such as the 2-D graphics engine and the DSP not only improve audio/video performance, but also significantly reduce power consumption by reducing the required clock frequency and input/output power. The 2-D graphics engine efficiently executes image processing such as picture size conversion. If embedded DRAM is not used, the microprocessor has to communicate with external memory, and power consumption increases as a result of driving the external memory interface pins. Furthermore, embedded DRAM with a wideband bus has a higher data transfer rate than external memory, so the processor can operate at a lower frequency and supply voltage. The maximum data transfer rate of the wideband bus is 7.86 GB/s.
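As a rough check on the quoted transfer rate, the sketch below assumes (this is not stated explicitly in the text) that all four 128-bit macros each transfer one word per system-clock cycle at the nominal 123-MHz maximum. The function name is hypothetical. Under that assumption the peak rate comes out near 7.87 GB/s, close to the 7.86 GB/s quoted above.

```python
def peak_bandwidth_gb_per_s(macros: int, bus_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in GB/s (1 GB = 1e9 bytes).

    Assumes every macro moves one bus word per clock cycle (an assumption,
    not a figure taken from the paper).
    """
    bytes_per_cycle = macros * bus_bits / 8
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


# Four 128-bit eDRAM macros at the nominal 123-MHz maximum system clock.
rate = peak_bandwidth_gb_per_s(macros=4, bus_bits=128, clock_mhz=123)
print(f"{rate:.2f} GB/s")
```

The small gap between this estimate and the quoted value would be consistent with the maximum clock being slightly below the rounded 123 MHz, but the text does not confirm this.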

The DVFM block, clock generator and dc-dc IF block control the supply voltage and clock frequency. The supply voltage of the DRAM macros and phase-locked loops (PLLs) is fixed to 1.6 V, but the supply voltage of other logic circuit blocks is controlled dynamically by the DVFM in the range between 0.9 and 1.6 V. Peripheral circuit blocks that communicate with external devices are driven by fixed frequency clocks, but all other circuit blocks are driven by the dynamically controlled system clock. In this implementation the system clock is set between 8 and 123 MHz with a control resolution of 0.5 MHz. The embedded DRAM macros and an external SDRAM are also driven by the same system clock. The DVFM circuit block controls supply voltage and clock frequency for a large part of this LSI. Therefore, our DVFM scheme is highly effective in realizing ultra-low power consumption.

Fig. 1. LSI block diagram.

Fig. 2. Die microphotograph.

TABLE I LSI SPECIFICATION

| Item       | Specification                                                        |
|------------|----------------------------------------------------------------------|
| Technology | 0.18-μm 5-metal CMOS (embedded DRAM)                                 |
| Voltage    | 3.3 V, 2.5 V (I/O, analog)                                           |
|            | 1.6 V (OSC, PLL)                                                     |
|            | 1.6 V, 3.2 V (eDRAM, word line)                                      |
|            | 0.9-1.6 V, dynamically changed (logic)                               |
| Clock      | 8-123 MHz (CPU, CPU bus, 128-bit bus, etc.)                          |
|            | 96, 48, 38.4, 24, 19.2, 22.5792, 14.769 MHz, etc. (peripherals)      |
| Logic      | 1.35 million gates                                                   |
| eDRAM      | 64 Mb                                                                |
| SRAM/ROM   | 1.3 Mb / 20 kb                                                       |
| Die size   | 10.93 mm × 13.18 mm                                                  |
| Package    | 368-pin LFLGA                                                        |

Fig. 2 and Table I show the microphotograph and specification of the microprocessor. The chip is fabricated using a 0.18-$\mu$m 5-metal CMOS embedded DRAM process. The logic circuit gate count is 1.35 million and the chip size is 10.93 mm by 13.18 mm. The logic part of the microprocessor is designed using standard cells, except for the four embedded DRAM macros, analog cells such as the PLL, and a delay synthesizer circuit which emulates the critical paths of the LSI.

#### III. DYNAMIC VOLTAGE AND FREQUENCY MANAGEMENT SCHEME

Fig. 3 shows the DVFM block which consists of the DFC and DVC units. The DVC emulates the critical-path characteristic using a delay synthesizer and controls the dynamic supply voltage. The DFC controls the clock frequency at the required minimum value by monitoring LSI activity autonomously. The details of each block are explained below.

Fig. 3. DVFM block diagram.

Fig. 4. Delay measurement.

#### A. Dynamic Voltage Control (DVC)

The DVC block consists of three major parts: the pulse generator, the delay synthesizer, and the delay detector. The pulse generator creates a detect pulse signal and a detect clock signal, as shown in Fig. 4. The detect pulse propagates through the delay synthesizer and reaches the delay detector. The delay detector consists of a delay line gauge and flip-flops. The flip-flops capture the signal from the delay synthesizer at the positive edge of the detect clock and digitize the propagation delay. By comparing the digitized delay value with the target value, the delay detector determines whether to increase, decrease, or keep the present supply voltage. The off-chip dc-dc converter is controlled in real time, over the range from 0.9 to 1.6 V in 5-mV steps, so that the digitized value matches the target. This supplies the minimum operating voltage that keeps the LSI operating correctly at the given clock frequency. Because the delay line gauge with 5-bit output needs to detect even small delay fluctuations, each of the 32 delay line elements consists of two inverters, resulting in better sensitivity, >6 mV/digit. When the supply voltage increases by 6 mV, the delay line gauge's output increases by one digit. Because the gauge is this sensitive, voltage control in 5-mV steps is possible. For stable control, the equivalent voltage resolution of the delay line gauge must be larger than the supply voltage resolution.
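The compare-and-step behavior of the delay detector can be sketched as follows. This is a minimal illustration, not the authors' circuit: the function names, the target code, and the toy gauge model (one digit per 6 mV above an arbitrary 1000-mV offset) are assumptions made only so the loop can be run; the voltage range, the 5-mV step, and the 5-bit gauge width come from the text.

```python
V_MIN_MV, V_MAX_MV, V_STEP_MV = 900, 1600, 5  # range and step from the text


def dvc_step(supply_mv: int, gauge_code: int, target_code: int) -> int:
    """Return the next supply setting in mV.

    gauge_code is the 5-bit delay line gauge output (0..31). It rises with
    supply voltage, so a code below target means the logic is too slow.
    """
    if not 0 <= gauge_code <= 31:
        raise ValueError("gauge_code must fit in 5 bits")
    if gauge_code < target_code:
        supply_mv += V_STEP_MV      # too slow: raise the supply
    elif gauge_code > target_code:
        supply_mv -= V_STEP_MV      # faster than needed: lower the supply
    return max(V_MIN_MV, min(V_MAX_MV, supply_mv))


def toy_gauge(supply_mv: int) -> int:
    """Hypothetical gauge model: one digit per 6 mV, clamped to 5 bits."""
    return max(0, min(31, (supply_mv - 1000) // 6))


supply = 1600
for _ in range(200):                # iterate the feedback loop
    supply = dvc_step(supply, toy_gauge(supply), target_code=16)
print(supply)
```

In this toy model each gauge digit spans 6 mV, which is wider than the 5-mV supply step, so the loop always finds a setting that reads exactly the target code and then holds it rather than oscillating. That is the stability condition stated above.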

Fig. 5 shows the details of the delay synthesizer. The delay characteristic of the actual LSI is composed of not only a simple transistor delay factor, but also wire delay and other factors. The delay synthesizer consists of three programmable delay components: a gate delay component, an RC delay component, and a rise/fall delay component. The gate delay component includes two types of NAND gates, one of nominal gate length and another of long gate length. The RC delay component includes wires from each of the four metal layers, with a total length of 14 mm. These three components were chosen based on an analysis of the characteristics of several LSIs. Use of a delay-tracking unit as a method for reproducing the circuit configuration of the critical path has been reported [8]. However, in this technique, the delay-tracking unit has to be designed for every LSI, and it cannot be reused in other LSIs. In contrast, our delay synthesizer can emulate the critical path's characteristics by using only three components. Desired delay characteristics can be synthesized by combining those delay factors, which are controlled by 6-bit signals. Additionally, by changing these 6-bit control signals, it is also possible to emulate two or more critical paths by time-sharing. This delay synthesizer module is reusable as IP, and can be used in other LSI chips of the same process generation.

Fig. 5. Delay synthesizer.

Fig. 6 shows the accuracy of the delay synthesizer in tracking the main logic delay. Fig. 6(a) shows the frequency-voltage characteristic with the delay modeled using only nominal gate-length NAND gates. Fig. 6(b) shows the frequency-voltage characteristic achieved with our multi-component delay synthesizer circuit. In Fig. 6, each pair of lines corresponds to one process deviation; the solid line shows the emulation characteristic and the dashed line shows the critical path's characteristic. As shown in Fig. 6(a), the real LSI characteristics cannot be emulated by using only the nominal gate delay without the delay synthesizer. In contrast, the delay synthesizer tracks well, within 4% voltage accuracy over the full range of process deviation and voltage, as shown in Fig. 6(b). To determine the parameters of the delay synthesizer, the relation between the transistor current and the delay characteristic of each of its components is modeled. The characteristic of the critical path is modeled similarly. To extract the critical path's characteristic, the test vectors that maximize power consumption are chosen as the critical-path access derived from timing analysis. The parameters are then chosen to fit the model of the critical path. In silicon, the resulting delay contributions were nominal gate delay 25%, long gate delay 38%, RC delay 25%, and rise/fall delay 12%. These parameters are fixed for all chips in actual manufacturing.

The most notable feature of the DVC is its ability to adapt itself to compensate for leakage power over process deviation, which becomes more pronounced as process technology shrinks. Fig. 7 shows the leakage-compensation effect achieved by using DVC. In this figure, the horizontal axis shows process deviation while the vertical axis shows measured power consumption. The solid line shows total power consumption, which includes dynamic power and leakage power. The dashed line shows dynamic power consumption only. The dynamic power is constant under process deviation because supply voltage is constant. The leakage and total power increase as the threshold voltage $(V_{\rm th})$ becomes lower. The DVC detects the fluctuation of circuit delay due to the process deviation, and adaptively controls the supply voltage to maintain efficient LSI operation. When $V_{\rm th}$ becomes lower, the DVC reduces the supply voltage, so the maximum total power consumption decreases. That is, our DVC adaptively compensates for leakage power by minimizing the supply voltage according to the process deviation and temperature fluctuation.

#### B. Dynamic Frequency Control (DFC)

The DFC consists of an activity monitor and a frequency adjuster as shown in Fig. 3. The DFC block controls the clock frequency at the required minimum value autonomously in hardware without special power-management software. The activity monitor calculates the total LSI activity periodically from activity information of embedded DRAM, bus, and CPU. The frequency adjuster circuit unit calculates the optimum clock frequency based on the activity value derived from the activity monitor to reserve the required number of inactive margin cycles within the monitoring period and indicates the next clock frequency to the clock generator. Fig. 8 shows the details of the activity monitor. The activity monitor counts the maximum value of activities from the CPU, the Bus Ctl, and the eDRAM Ctl at an arbitrary period that can be set via software.
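One plausible reading of the activity monitor is sketched below. The text says only that the monitor takes the maximum of the CPU, bus controller, and eDRAM controller activities over a software-set period; how that count is converted into a frequency is not specified. The conversion used here (busiest block's active cycles divided by the period length) and the function name are assumptions for illustration.

```python
def activity_mhz(cpu_cycles: int, bus_cycles: int, edram_cycles: int,
                 period_us: float) -> float:
    """Effective frequency implied by the busiest block over one period.

    Assumption: activity is measured as active clock cycles per block, and
    cycles per microsecond is read directly as MHz.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    busiest = max(cpu_cycles, bus_cycles, edram_cycles)
    return busiest / period_us


# Example: a 1-ms monitoring period in which the bus was the busiest block.
print(activity_mhz(cpu_cycles=12_000, bus_cycles=30_000,
                   edram_cycles=24_000, period_us=1_000.0))
```

Taking the maximum rather than the sum reflects the description above: the clock has to be fast enough for whichever block is most heavily loaded.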

Fig. 9 shows the frequency decision flow. In this figure, "Act" stands for the effective frequency needed to ensure proper operation over the current monitoring period, "Margin" for the clock frequency margin to ensure proper operation, and "Step" for the minimum step value of the controlled frequency. The clock frequency of the next monitoring period is a function of the activity information of the current monitoring period and the margin setting. At the end of each monitoring period, the DFC uses the activity data to determine the required minimum clock frequency. When the clock frequency of the monitoring period just ending is less than the value of Act plus Margin, the clock frequency of the next monitoring period is increased by the Step value. On the other hand, if the current clock frequency minus the Step value is greater than the value of Act plus Margin, the next frequency is decreased by the Step value. Otherwise, the clock frequency is held constant. In actual applications, the value of Margin is normally set to 2-3 MHz, and it can be changed via software. To guarantee the proper operation of applications with real-time performance requirements, it is possible to set a lower limit on the system clock frequency via software. The clock frequency may also be set directly via software to allow abrupt performance change in response to external events.

Fig. 6. Tracking characteristic.

Fig. 7. Leakage-compensation effect.

Fig. 8. Activity monitor.

Fig. 9. Frequency decision flow.
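The three-way decision described above (raise by Step, lower by Step, or hold) can be written down directly. The sketch below follows the stated rule; the function name, the default values, and the clamping to the 8-123 MHz range are assumptions added for illustration.

```python
def next_frequency(current: float, act: float, margin: float = 2.0,
                   step: float = 0.5, f_min: float = 8.0,
                   f_max: float = 123.0) -> float:
    """Clock frequency (MHz) for the next monitoring period.

    current: frequency of the period just ending
    act:     effective frequency required by the measured activity
    margin:  safety margin (2-3 MHz is typical according to the text)
    f_min:   software-settable lower limit (default is an assumption)
    """
    required = act + margin
    if current < required:
        nxt = current + step        # too slow: step up
    elif current - step > required:
        nxt = current - step        # a lower step still leaves margin: step down
    else:
        nxt = current               # within one step of the requirement: hold
    return min(f_max, max(f_min, nxt))


print(next_frequency(48.0, act=20.0))   # steps down toward the requirement
print(next_frequency(20.0, act=20.0))   # steps up to restore the margin
print(next_frequency(22.3, act=20.0))   # holds
```

Because the step-down test compares `current - step` rather than `current` against the requirement, the rule has a one-step dead band, which keeps the frequency from toggling up and down on consecutive periods when activity is steady.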

#### C. Frequency and Voltage Control Scenarios

Fig. 10 describes the operation of the DVFM. When the clock frequency is required to increase, the DVC reference clock is switched to the next higher frequency in advance and the DFC directs the dc-dc converter to raise the supply voltage rapidly at point (1) in Fig. 10(a). The main logic clock frequency is changed after the DVC confirms that the voltage has increased enough, at point (2) in Fig. 10(a). The feedback loop of the DVC maintains the supply voltage at the minimum level needed for device operation at the current clock frequency. When the clock frequency is lowered, the DVC reference clock and the system clock are changed simultaneously, and the supply voltage starts to decrease at point (3) in Fig. 10(a). Since the DVC can act as a monitor of the internal voltage, safe control of the voltage and the frequency is easily achieved. Note that the thinned-out clock must not be supplied to the DVC. Fig. 10(c) shows a measurement of a system clock transition from 24.5 to 48.5 MHz along with the system clock generation scheme. In this case, the thinned-out system clock frequencies of 24.5 and 48.5 MHz were generated from 32- and 64-MHz base clocks, respectively. Fig. 11 shows the clock thinning circuit, which consists of a counter, decoder, comparator, and clock enabler. In this figure, "maxd" is the base clock frequency and "compb" is the degree of thinning. For example, when the 32-MHz clock is chosen and the target frequency is 28 MHz, maxd is set to 32 and cmpb is set to 4. The decoder equalizes the thinning period. The system clock is selected from the eight base clocks between 8 and 123 MHz, thinned out to attain the frequencies between them, and switched seamlessly from frequency to frequency as shown in Fig. 10(b). Therefore, the LSI can operate continuously without PLL relock or system reset when the clock frequency changes.

Fig. 10. Frequency and voltage control scenarios in DVFM and system clock transition.

Fig. 11. Clock thinning circuit.
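The effect of clock thinning in the 32-MHz to 28-MHz example can be illustrated with a small model. The parameter names `maxd` and `cmpb` follow the example in the text; the accumulator used to spread the removed pulses evenly is an assumption standing in for the decoder's equalization, whose actual logic is not described, and the function names are hypothetical.

```python
def thinning_pattern(maxd: int, cmpb: int) -> list[bool]:
    """Clock-enable pattern over maxd base-clock cycles.

    True means the base-clock pulse is passed, False means it is removed.
    Exactly cmpb pulses are removed, spread evenly with an accumulator
    (an assumed stand-in for the decoder's equalization).
    """
    if not 0 <= cmpb < maxd:
        raise ValueError("require 0 <= cmpb < maxd")
    pattern, acc = [], 0
    for _ in range(maxd):
        acc += cmpb
        if acc >= maxd:
            acc -= maxd
            pattern.append(False)   # remove this pulse
        else:
            pattern.append(True)    # pass this pulse
    return pattern


def effective_mhz(base_mhz: float, maxd: int, cmpb: int) -> float:
    """Average output frequency after thinning."""
    return base_mhz * sum(thinning_pattern(maxd, cmpb)) / maxd


pattern = thinning_pattern(maxd=32, cmpb=4)
removed = [i for i, enabled in enumerate(pattern) if not enabled]
print(removed, effective_mhz(32.0, 32, 4))
```

With these settings every eighth base-clock pulse is dropped, so 28 of every 32 pulses reach the logic and the average frequency is 28 MHz without any change to the PLL output.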

#### IV. POWER REDUCTION EFFECT

This section describes the power reduction effect of DVFM. Fig. 12(a) shows the power consumption in the case of MPEG4 playback. With the conventional design technique, which does not use the wideband bus architecture, the power consumption is 741 mW. This does not satisfy the power budget explained in Section I. Introducing the wideband embedded DRAM architecture considerably reduces the power consumption in the input/output; the power was reduced by 53%. Moreover, the DVFM optimizes the operating frequency and the supply voltage, resulting in a final power consumption of 210 mW and enabling long movie playback on a portable device. Efficiency loss of the external dc-dc converter is not included in this value.

Fig. 12. Comparison of power consumption (measured power in mW). (a) MPEG4 decoding at 15 fps, 216 kbps, and AAC 64 kbps: conventional design (DRAM outside), 741 mW; this work with DVFM off, 350 mW; this work with DVFM on, 210 mW. (b) PIM (scheduler).
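The MPEG4 percentages can be cross-checked from the power values. The sketch below uses 741 mW and 210 mW from the text and takes 350 mW as the value shown in Fig. 12(a) for this work with DVFM off; that assignment is an assumption about which bar the number belongs to, and the function name is hypothetical.

```python
def reduction_percent(before_mw: float, after_mw: float) -> float:
    """Power reduction from before_mw to after_mw, in percent."""
    return 100.0 * (before_mw - after_mw) / before_mw


conventional, wideband, dvfm_on = 741.0, 350.0, 210.0

print(round(reduction_percent(conventional, wideband)))  # wideband eDRAM architecture
print(round(reduction_percent(wideband, dvfm_on)))       # DVFM on top of that
print(round(reduction_percent(conventional, dvfm_on)))   # both combined
```

The first two results reproduce the 53% figure given for the wideband architecture and the 40% MPEG4 reduction attributed to DVFM in the abstract; the combined reduction relative to the conventional design is about 72%.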

Fig. 12(b) shows the case of the PIM application. The power consumption in the input/output is smaller than in the case of the MPEG4 application because there is less access to the memory blocks. Also, high-frequency operation is not needed for the PIM application. In this situation, the DVFM detects low activity in the hardware and, as a result, lowers the operating clock frequency to the minimum allowable value. Furthermore, the power consumption can be reduced to 45 mW by voltage optimization. Finally, 83% power reduction is achieved.

#### V. CONCLUSION

A DVFM scheme with leakage compensation was introduced. Our DVFM scheme autonomously controls the clock frequency from 8 to 123 MHz in steps of 0.5 MHz and adaptively controls voltage from 0.9 to 1.6 V in steps of 5 mV. Our delay synthesizer tracks well, within 4% voltage accuracy over the full range of process deviation and voltage. The system clock is switched seamlessly from frequency to frequency. Therefore, the LSI can operate continuously without PLL relock or system reset when the clock frequency changes.

In a PIM application for a low-power embedded microprocessor, 82% power reduction was achieved. As process technology shrinks, this adaptive leakage-compensation scheme will become more important in realizing high-performance and low-power mobile consumer applications.

#### ACKNOWLEDGMENT

The authors thank M. Soneda, S. Amano, M. Miyabayashi, Y. Amagasaki, S. Tejima, C. Samwald, and Y. Hagiwara for help, suggestions, and support.

#### REFERENCES

- [1] P. Macken et al., "A voltage reduction technique for digital systems," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1990, pp. 238–239.
- [2] K. Suzuki et al., "A 300 MIPS/W RISC core processor with variable supply-voltage scheme in variable threshold-voltage CMOS," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), May 1997, pp. 587–590.
- [3] S. Sakiyama et al., "A lean power management technique: The lowest power consumption for the given operating speed of LSIs," in Symp. VLSI Circuits Dig. Tech. Papers, June 1997, pp. 99–100.
- [4] T. Burd et al., "A dynamic voltage scaled microprocessor system," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2000, pp. 294–295.
- [5] K. Nowka et al., "A 0.9 V to 1.95 V dynamic voltage-scalable and frequency-scalable 32 b PowerPC processor," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2002, pp. 340–341.
- [6] K. Nose et al., "V<sub>th</sub>-hopping scheme to reduce subthreshold leakage for low-power processors," *IEEE J. Solid-State Circuits*, vol. 37, no. 3, pp. 413–419, Mar. 2002.
- [7] J. Tschanz et al., "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1396–1402, Nov. 2002.
- [8] M. Dean, "STRiP: A Self-Timed RISC Processor," Ph.D. thesis, Dept. Electr. Eng., Stanford Univ., Stanford, CA, June 1992.



Masakatsu Nakai received the B.S. and M.S. degrees in electronics engineering from Tokai University, Tokyo, Japan, in 1988 and 1990, respectively.

In 1990, he joined Sony Corporation, Tokyo, Japan, where he developed single chip microcontrollers, memories, and analog circuits. Since 1999, he has been engaged in research and development of low-power circuits and techniques for CMOS LSI.



Tetsuo Kondo received the B.S. degree in engineering from Shizuoka University, Shizuoka, Japan.

He joined Sony Corporation, Tokyo, Japan, in 1998, where he has been engaged in work on the dynamic voltage and frequency management technique and the low-power and reconfigurable processor.



Satoshi Akui received the B.S. and M.S. degrees from the Tokyo Institute of Technology, Tokyo, Japan, in 1986 and 1988, respectively.

He joined Sony Corporation, Tokyo, Japan, in 1988. Since then, he has been engaged in research and development of new design methodologies. Since 2000, he has been engaged in research and development of mobile application processors.



Akihiko Hashiguchi received the B.S. and M.S. degrees in energy conversion engineering from Kyusyu University, Japan, in 1988 and 1990, respectively.

In 1990, he joined Sony Corporation, Tokyo, Japan, where he designed DRAM. From 1993 to 1998, he was involved in the research of the Video DSP. From 1999, he researched the low power circuit and systems. Now, he is engaged in the development of sub-100-nm low-power LSI and the embedded SRAM.

Mr. Hashiguchi has been a member of the IEICE Electrical Society Technical Committee on Integrated Circuits and Devices since 2001.



Katsunori Seno received the B.S. and M.S. degrees in electrical engineering from Waseda University, Tokyo, Japan, in 1986 and 1988, respectively.

In 1988, he joined Sony Corporation, where he was engaged in the development of high-speed 4 Mb and 16 Mb CMOS SRAM and MPEG2 video DSP. From 1995 to 1997, he was a Visiting Researcher at the University of California, Berkeley, doing research in the field of low power. He rejoined Sony in 1998. He is currently heading development of low-power dynamic reconfigurable architecture and power man-



Hirokazu Kawahara received the B.S. degree in electrical engineering from the Tokyo University of Science, Japan, in 1984.

He joined Sony Corporation, Tokyo, where he has developed computer workstation systems. At present, he is engaged in development of low-power computer systems for mobile consumer products.



Circuits.

Tetsumasa Meguro received the B.S. and M.S. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1993 and 1995, respectively.

In 1995, he joined Sony Corporation, Tokyo. He has been involved in the development of Video DSP. He is currently engaged in the development of low-power LSI design.



Kazuo Kumano received the B.S. degree in electrical engineering from the University of Hiroshima, Hiroshima, Japan.

In 1984, he joined Sony Corporation, Tokyo, Japan, where he has developed CMOS analog LSI circuits and system LSI circuits for cellular phones and PDAs since 1990.



Takahiro Seki received the B.S. and M.S. degrees in electrical engineering from the Tokyo University of Science, Japan, in 1990 and 1992, respectively.

In 1992, he joined Sony Corporation, Japan. He was engaged in research and development of CMOS digital and analog circuits, flash memory, and high-speed serial interface LSI. Since 1998, he has been engaged in research and development of low-power circuits and systems.



Masayuki Shimura received the B.S. degree in physics from Tsukuba University, Ibaraki, Japan.

In 1987, he joined Sony Corporation, Tokyo, Japan. He was involved in the development of image sensor and camera control generator LSI and signal processing LSI for CCD cameras. From 1997 to 2001, he was on loan to Sony LSI Design Inc., where he was engaged in the logic design of MOS LSI for camera and video applications and network I/F, DVD player, DTV, etc. In 2001, he rejoined Sony Corporation where he is currently General Manager of LSI Design Department for high-speed low-power processor SoC of mobile consumer products.